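4.13 Merging and Aligning Series Data
1. Joining within a Series (joining each element's own items)
When every element of a Series is itself an iterable (a list, an array, a Series, and so on), the items inside each element can be joined into a single string with s.str.join(). Its signature is:
s.str.join(sep)
sep: the separator placed between the joined items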
import numpy as np
import pandas as pd
s = pd.Series([
    [98, "100", "85"], [63, "75"], ["96", "41", 9, "102"], ["ss", "dd"],
    pd.Series(["3", "3433", "343"])])  # an element that contains non-string items would be joined to NaN
# a = s.map(lambda l: pd.Series(l, dtype="str"))
a = s.map(lambda l: np.array(l, dtype="str"))  # same effect as the line above: every item becomes a string
t = a.str.join("-")
print(t)
Output:
0 | 98-100-85 |
1 | 63-75 |
2 | 96-41-9-102 |
3 | ss-dd |
4 | 3-3433-343 |
dtype: object
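To make the effect of the string conversion visible, the following minimal sketch joins the same kind of data with and without converting the items first. The two-element Series and the names raw and joined are illustrative assumptions, and a plain list comprehension is used here in place of np.array.

```python
import pandas as pd

# Illustrative data: the first element mixes an int with strings.
s = pd.Series([[98, "100", "85"], ["ss", "dd"]])

# Joining directly: an element that contains a non-string item becomes NaN.
raw = s.str.join("-")

# Converting every item to str first lets each element be joined.
joined = s.map(lambda items: [str(x) for x in items]).str.join("-")

print(raw)
print(joined)
```

Only the first element of raw is missing, because the second element already consists entirely of strings.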
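2. Concatenating Series data aligned by position
Besides joining the iterable items inside its own elements, a Series can be concatenated with other objects using s.str.cat(). Every element of the Series and of the objects being concatenated must be a string. The signature of s.str.cat() is:
s.str.cat(others=None, sep=None, na_rep=None, join="left")
others: the object(s) to concatenate with the Series; a list, an array, a Series, or a DataFrame
sep: the separator used when concatenating; if omitted, no separator is inserted
na_rep: the string that replaces missing values; if it is omitted and others is given, any row that contains a missing value produces a missing value in the result
join: how the indexes are aligned when concatenating; one of left, right, outer, inner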
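The effect of na_rep described above can be checked with a small sketch. The data and the variable names (others, without_rep, with_rep, collapsed) are assumptions chosen for illustration only.

```python
import numpy as np
import pandas as pd

s = pd.Series(["a", "b", "c"])
others = pd.Series(["x", np.nan, "z"])  # illustrative data with one missing value

# na_rep omitted: the row with a missing value yields a missing value.
without_rep = s.str.cat(others, sep="-")

# na_rep given: the missing value is replaced before concatenation.
with_rep = s.str.cat(others, sep="-", na_rep="?")

# others omitted: missing values in the Series itself are skipped.
collapsed = pd.Series(["a", np.nan, "c"]).str.cat(sep="-")

print(without_rep)
print(with_rep)
print(collapsed)
```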
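import pandas as pd
s = pd.Series(["a", "b", "c"])
t1 = "-".join(s)         # Python's built-in str.join over the Series values
t2 = s.str.cat(sep="-")  # others omitted: the whole Series collapses into one string
t3 = s.str.join("-")     # joins the characters inside each element, so single characters are unchanged
t4 = s.str.cat()         # no separator
print(t1)
print(t2)
print(t3)
print(t4)
Output: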
a-b-c
a-b-c
0 | a |
1 | b |
2 | c |
dtype: object
abc
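import pandas as pd
s1 = pd.Series(["a", "b", "c"])
s2 = pd.Series([["a", "b", "c"], ["1", "2", "3"]])
t1 = s1.str.join("-")
t2 = s2.str.join("-")  # each element must be an iterable of strings (here a list) to be joined
t3 = s2.str.join("-").str.cat(sep="@")  # join inside each element, then collapse the Series into one string
print(t1)
print(t2)
print(t3)
Output: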
0 | a |
1 | b |
2 | c |
dtype: object
0 | a-b-c |
1 | 1-2-3 |
dtype: object
a-b-c@1-2-3
import pandas as pd
s = pd.Series(
    data=["张三", "李四", "王五"],
    index=["NDE01", "EDN04", "EDN05"])
ages = ["39", "40", "45"]
t = s.str.cat(ages, sep="-")  # a plain list is paired with the Series by position
print(t)
Output:
NDE01 | 张三-39 |
EDN04 | 李四-40 |
EDN05 | 王五-45 |
dtype: object
import numpy as np
import pandas as pd
arr = np.array([
    ["28", "张三", "财务部"],
    ["34", "李四", "销售部"],
    ["56", "王五", "开发部"]
])
s = pd.Series(
    data=["张三", "李四", "王五"],
    index=["NDE01", "EDN04", "EDN05"])
t = s.str.cat(arr, sep="-")  # each column of the 2-D array is appended in turn, row by row
print(t)
Output:
NDE01 | 张三-28-张三-财务部 |
EDN04 | 李四-34-李四-销售部 |
EDN05 | 王五-56-王五-开发部 |
dtype: object
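Because a list or array carries no index, it can only be paired with the Series by position, so its length has to match. The sketch below illustrates this under the assumption of a deliberately too-short list; the names ok and error are hypothetical.

```python
import pandas as pd

s = pd.Series(["张三", "李四", "王五"], index=["NDE01", "EDN04", "EDN05"])

# Same length: items are paired strictly by position, index labels are ignored.
ok = s.str.cat(["39", "40", "45"], sep="-")

# Wrong length: the list cannot be paired by position and pandas raises ValueError.
try:
    s.str.cat(["39", "40"], sep="-")
    error = None
except ValueError as exc:
    error = exc

print(ok)
print(type(error).__name__)
```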
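3. Concatenating Series data aligned by index
When others is a Series or DataFrame, it carries its own index, so s.str.cat() matches rows by index label rather than by position; the join parameter decides which labels are kept.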
import numpy as np
import pandas as pd
s = pd.Series(
    data=["张三", "李四", "王五"],
    index=["END01", "END04", "END03"])
arr = np.array([
    ["28", "张三", "财务部"],
    ["34", "李四", "销售部"],
    ["56", "王五", "开发部"]
])
df = pd.DataFrame(
    data=arr,
    index=["END09", "END03", "END01"],
    columns=["年龄", "姓名", "部门"]
)
t1 = s.str.cat(df, sep="-", na_rep="None", join="left")   # keep the index of s
t2 = s.str.cat(df, sep="-", na_rep="None", join="right")  # keep the index of df
t3 = s.str.cat(df, sep="-", na_rep="None", join="outer")  # union of both indexes, sorted
t4 = s.str.cat(df, sep="-", na_rep="None", join="inner")  # only labels present in both
print(t1)
print(t2)
print(t3)
print(t4)
Output:
END01 | 张三-56-王五-开发部 |
END04 | 李四-None-None-None |
END03 | 王五-34-李四-销售部 |
dtype: object
END09 | None-28-张三-财务部 |
END03 | 王五-34-李四-销售部 |
END01 | 张三-56-王五-开发部 |
dtype: object
END01 | 张三-56-王五-开发部 |
END03 | 王五-34-李四-销售部 |
END04 | 李四-None-None-None |
END09 | None-28-张三-财务部 |
dtype: object
END01 | 张三-56-王五-开发部 |
END03 | 王五-34-李四-销售部 |
dtype: object
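The contrast between index alignment and positional pairing can be seen by passing the same values once as a Series and once as a plain array. This is a minimal sketch; the ages Series and the names by_index and by_position are assumptions made for illustration.

```python
import pandas as pd

s = pd.Series(["张三", "李四", "王五"], index=["END01", "END04", "END03"])
ages = pd.Series(["28", "34", "56"], index=["END09", "END03", "END01"])

# A Series carries an index, so rows are matched by label (left join on s).
by_index = s.str.cat(ages, sep="-", na_rep="None", join="left")

# Converting to a NumPy array drops the index, so rows are matched by position.
by_position = s.str.cat(ages.to_numpy(), sep="-")

print(by_index)
print(by_position)
```

Dropping the index is only appropriate when the two objects are already known to be in the same row order.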